Bayesian Extraction of Minimal SCFG Rules for Hierarchical Phrase-based Translation

نویسندگان

  • Baskaran Sankaran
  • Gholamreza Haffari
  • Anoop Sarkar
چکیده

We present a novel approach for extracting a minimal synchronous context-free grammar (SCFG) for Hiero-style statistical machine translation using a non-parametric Bayesian framework. Our approach is designed to extract rules that are licensed by the word alignments and heuristically extracted phrase pairs. Our Bayesian model limits the number of SCFG rules extracted, by sampling from the space of all possible hierarchical rules; additionally our informed prior based on the lexical alignment probabilities biases the grammar to extract high quality rules leading to improved generalization and the automatic identification of commonly re-used rules. We show that our Bayesian model is able to extract minimal set of hierarchical phrase rules without impacting the translation quality as measured by the BLEU score.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Hierarchical Back-off Modeling of Hiero Grammar based on Non-parametric Bayesian Model

In hierarchical phrase-based machine translation, a rule table is automatically learned by heuristically extracting synchronous rules from a parallel corpus. As a result, spuriously many rules are extracted which may be composed of various incorrect rules. The larger rule table incurs more disk and memory resources, and sometimes results in lower translation quality. To resolve the problems, we...

متن کامل

Left-to-Right Hierarchical Phrase-based Machine Translation

Hierarchical phrase-based translation (Hiero for short) models statistical machine translation (SMT) using a lexicalized synchronous context-free grammar (SCFG) extracted from word aligned bitexts. The standard decoding algorithm for Hiero uses a CKY-style dynamic programming algorithm with time complexity O(n3) for source input with n words. Scoring target language strings using a language mod...

متن کامل

Using Features from Topic Models to Alleviate Over-Generation in Hierarchical Phrase-Based Translation

In hierarchical phrase-based translation systems, the grammars (SCFG rules) have over-generation problem because we can replace the non-terminalX with almost everything without knowing the syntactic or semantic role ofX . In this paper, we present an approach that uses topic models to learn the distributions for non-terminals in each SCFG rule, based on which we further derive static features f...

متن کامل

Utilizing Target-Side Semantic Role Labels to Assist Hierarchical Phrase-based Machine Translation

In this paper we present a novel approach of utilizing Semantic Role Labeling (SRL) information to improve Hierarchical Phrasebased Machine Translation. We propose an algorithm to extract SRL-aware Synchronous Context-Free Grammar (SCFG) rules. Conventional Hiero-style SCFG rules will also be extracted in the same framework. Special conversion rules are applied to ensure that when SRL-aware SCF...

متن کامل

Hierarchical Phrase-Based Grammar Extraction in JoshuaSuffix Arrays and Prefix Trees

While example-based machine translation has long used corpus information at run-time, statistical phrase-based approaches typically include a preprocessing stage where an aligned parallel corpus is split into phrases, and parameter values are calculated for each phrase using simple relative frequency estimates. This paper describes an open source implementation of the crucial algorithms present...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011